# Visual Reasoning

Nvidia.cosmos Reason1 7B GGUF

Cosmos-Reason1-7B is a 7B-parameter foundation model released by NVIDIA that specializes in image-to-text tasks.

Large Language Model

A vision-language model based on the ViLT architecture, fine-tuned specifically for GQA visual reasoning tasks.

VL Rethinker 7B 6bit

A multimodal model based on Qwen2.5-VL-7B-Instruct that supports visual question answering, converted to MLX format and quantized to 6 bits for efficient inference on Apple silicon.

Transformers English

VL Rethinker 72B 8bit

A multimodal vision-language model based on Qwen2.5-VL-72B-Instruct, quantized to 8 bits and suited to visual question answering tasks.

Transformers English
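
The "6bit" and "8bit" suffixes on the two VL Rethinker entries refer to the quantization width of the stored weights. As a rough guide to what that means for download size and memory, the sketch below multiplies a nominal parameter count by the bits per weight. The function name `approx_weight_gb` is hypothetical, the parameter counts are read off the model names rather than measured, and the estimate ignores quantization scales, any unquantized layers, activations, and the KV cache, so real footprints will be somewhat larger.

```python
def approx_weight_gb(num_params: float, bits_per_weight: int) -> float:
    """Rough size of the weights alone, in gigabytes (1 GB = 10**9 bytes).

    Assumes every parameter is stored at exactly `bits_per_weight` bits,
    which is a simplification of real quantization formats.
    """
    bytes_total = num_params * bits_per_weight / 8  # 8 bits per byte
    return bytes_total / 1e9


if __name__ == "__main__":
    # Nominal sizes taken from the listing names (assumptions, not measurements).
    listings = {
        "VL Rethinker 7B 6bit": (7e9, 6),
        "VL Rethinker 72B 8bit": (72e9, 8),
    }
    for name, (params, bits) in listings.items():
        print(f"{name}: about {approx_weight_gb(params, bits):.2f} GB of weights")
```

Under these assumptions the 7B model at 6 bits needs on the order of 5 GB for weights, while the 72B model at 8 bits needs on the order of 72 GB, which is the main practical difference between the two entries.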

Idefics3 8B Llama3

Idefics3 is an open-source multimodal model capable of processing arbitrary sequences of image and text inputs to generate text outputs. It shows significant improvements in OCR, document understanding, and visual reasoning.

Transformers English

ChartQA is a visual question answering model focused on extracting information from charts and answering related questions.

Transformers Other

© 2025 AIbase